Data Warehouse Design: Modern Principles and Methodologies by Matteo Golfarelli & Stefano Rizzi
Author: Matteo Golfarelli & Stefano Rizzi
Language: eng
Format: epub
Publisher: McGraw-Hill
Published: 2009-03-14T16:00:00+00:00
The abstract problem behind this formula can be stated as follows: Given a box containing an infinite number of balls in n colors, with each color repeated an infinite number of times, how many different colors do you expect to see when you draw m balls?
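The ball-drawing problem above can be sketched in a few lines of Python. The formula itself does not appear in this excerpt, so the closed form below, n · (1 − (1 − 1/n)^m), is the commonly cited form of the Cardenas formula and rests on the assumption that every color is equally likely on each draw. The function name `cardenas` and the sample values are illustrative only.

```python
import math


def cardenas(n: int, m: int) -> float:
    """Expected number of distinct colors seen when drawing m balls
    from a box with n equally likely colors (assumed uniform draws)."""
    if n <= 0 or m < 0:
        raise ValueError("n must be positive and m non-negative")
    if m == 0:
        return 0.0
    if n == 1:
        return 1.0  # only one color exists, so any draw shows it
    # n * (1 - (1 - 1/n)**m), written with log1p/expm1 so the result
    # stays accurate when n is very large.
    return -n * math.expm1(m * math.log1p(-1.0 / n))


if __name__ == "__main__":
    for n, m in [(10, 1), (10, 10), (10, 100)]:
        print(f"n={n}, m={m}: about {cardenas(n, m):.2f} distinct colors")
```

The estimate never exceeds either n or m: a single draw always yields one color, and a very large number of draws approaches all n colors.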
The Cardenas formula is known to overestimate. As a result, the space that views actually use can be far smaller than the space allocated during the view materialization phase. To address this problem, Ciaccia and others (2003) introduced an approach that uses cardinality constraints from the application domain to define upper and lower bounds for the cardinality of every group-by set. On the basis of those bounds, you use probability formulas to calculate the expected cardinality. Domain experts express cardinality constraints either as the cardinality of one or more group-by sets or as k-dependencies, which limit the relationship between the cardinalities of two group-by sets.
To see how those bounds can improve a cardinality estimate, consider the following example. An enterprise wants to monitor the employees it transfers from one department to another; Figure 6-30 shows the reference fact schema. The primary group-by set is G0 = {date, fromDepartment, toDepartment, employee}. The designer wants to evaluate the cardinality of G = {date, fromDepartment, toDepartment}. Here, 10^4 is the number of employees transferred from one department to another at least once during a monitoring period of 10^3 days, and 10^3 is the number of departments. Without additional information, you can only state that the cardinality of G is greater than 10^3 and lower than 10^3 × 10^3 × 10^3 = 10^9, because, at most, each department can be involved in a personnel transfer to all the other departments every day. If your domain expert tells you that an employee can be transferred at most twice a year, and your monitoring period covers approximately three years, then the cardinality of G0 cannot be greater than 6 times the total number of employees, that is, 6 × 10^4.
It follows that the maximum estimate for G can be tightened, because the cardinality of a secondary group-by set cannot be greater than the cardinality of the primary group-by set: secondary events are aggregations of primary ones. To conclude, the result is 10^4 ≤ card(G) ≤ 6 × 10^4. Given these bounds, you can adopt probability-driven approaches, such as the Cardenas formula, to estimate the cardinality of G more reliably.
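The arithmetic behind the tightened upper bound in the transfer example can be checked with a short sketch. The figures come from the example; the function names are hypothetical, and the sketch covers only the upper bound. It does not reproduce the probabilistic estimation step of Ciaccia and others.

```python
# Figures from the transfer example in the text.
DAYS = 10**3                 # length of the monitoring period
DEPARTMENTS = 10**3          # number of departments
EMPLOYEES = 10**4            # employees transferred at least once
MAX_TRANSFERS_PER_YEAR = 2   # constraint stated by the domain expert
YEARS = 3                    # roughly 10^3 days


def naive_upper_bound(days: int, departments: int) -> int:
    """Upper bound on card(G) with no domain knowledge: every
    (date, fromDepartment, toDepartment) combination occurs."""
    return days * departments * departments


def primary_upper_bound(employees: int, per_year: int, years: int) -> int:
    """Upper bound on card(G0) implied by the transfer-frequency constraint."""
    return employees * per_year * years


def tightened_upper_bound(naive: int, primary: int) -> int:
    """A secondary group-by set cannot have more tuples than the primary one."""
    return min(naive, primary)


naive = naive_upper_bound(DAYS, DEPARTMENTS)
primary = primary_upper_bound(EMPLOYEES, MAX_TRANSFERS_PER_YEAR, YEARS)
tight = tightened_upper_bound(naive, primary)
print(f"naive upper bound on card(G):     {naive:,}")
print(f"upper bound on card(G0):          {primary:,}")
print(f"tightened upper bound on card(G): {tight:,}")
```

Under these assumptions the domain constraint lowers the upper bound on card(G) from 10^9 to 6 × 10^4, which narrows the range any subsequent probabilistic estimate has to cover.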
